138 research outputs found

    Large-Scale Plant Classification with Deep Neural Networks

    This paper discusses the potential of applying deep learning techniques for plant classification and its usage for citizen science in large-scale biodiversity monitoring. We show that plant classification using near state-of-the-art convolutional network architectures like ResNet50 achieves significant improvements in accuracy compared to the most widespread plant classification application in test sets composed of thousands of different species labels. We find that the predictions can be confidently used as a baseline classification in citizen science communities like iNaturalist (or its Spanish fork, Natusfera) which in turn can share their data with biodiversity portals like GBIF. Comment: 5 pages, 3 figures, 1 table. Published at Proceedings of ACM Computing Frontiers Conference 201

    DeepKey: Towards End-to-End Physical Key Replication From a Single Photograph

    This paper describes DeepKey, an end-to-end deep neural architecture capable of taking a digital RGB image of an 'everyday' scene containing a pin tumbler key (e.g. lying on a table or carpet) and fully automatically inferring a printable 3D key model. We report on the key detection performance and describe how candidates can be transformed into physical prints. We show an example opening a real-world lock. Our system is described in detail, providing a breakdown of all components including key detection, pose normalisation, bitting segmentation and 3D model inference. We provide an in-depth evaluation and conclude by reflecting on limitations, applications, potential security risks and societal impact. We contribute the DeepKey Datasets of 5,300+ images covering a few test keys with bounding boxes, pose and unaligned mask data. Comment: 14 pages, 12 figure

    Maximizing CNN Accelerator Efficiency Through Resource Partitioning

    Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection of layers is computed. However, this approach leads to inefficient designs because the same processor structure is used to compute CNN layers of radically varying dimensions. We present a new CNN accelerator paradigm and an accompanying automated design methodology that partitions the available FPGA resources into multiple processors, each of which is tailored for a different subset of the CNN convolutional layers. Using the same FPGA resources as a single large processor, multiple smaller specialized processors increase computational efficiency and lead to a higher overall throughput. Our design methodology achieves 3.8x higher throughput than the state-of-the-art approach on evaluating the popular AlexNet CNN on a Xilinx Virtex-7 FPGA. For the more recent SqueezeNet and GoogLeNet, the speedups are 2.2x and 2.0x
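    The inefficiency of reusing one processor structure for layers of very different dimensions can be illustrated with a toy utilization model. This is only a sketch under stated assumptions: it supposes a processor that handles `tn` input channels and `tm` output channels per cycle (hypothetical parameters, not taken from the paper) and it ignores memory bandwidth and every other effect the authors' methodology may model.

```python
import math


def utilization(n_in: int, n_out: int, tn: int, tm: int) -> float:
    """Fraction of multiply units doing useful work on one conv layer.

    Assumption (illustrative only): the processor unrolls `tn` input
    channels and `tm` output channels per cycle, so channel counts that
    are not multiples of the unroll factors leave units idle.
    """
    steps_in = math.ceil(n_in / tn)    # passes needed over input channels
    steps_out = math.ceil(n_out / tm)  # passes needed over output channels
    useful = n_in * n_out
    provisioned = steps_in * tn * steps_out * tm
    return useful / provisioned


# One large processor (tn=8, tm=64) on a 3-channel first layer:
print(utilization(3, 64, 8, 64))   # 0.375
# A smaller processor tailored to that layer (tn=3, tm=64):
print(utilization(3, 64, 3, 64))   # 1.0
```

    In this toy model, a processor sized for wide middle layers wastes most of its multipliers on a three-channel input layer, which is the kind of mismatch that motivates giving different subsets of layers their own processors.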

    Relay: A New IR for Machine Learning Frameworks

    Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon's deep learning framework MxNet

    Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN

    Discriminative localization is essential for the fine-grained image classification task, which is devoted to recognizing hundreds of subcategories in the same basic-level category. Reflecting on discriminative regions of objects, key differences among different subcategories are subtle and local. Existing methods generally adopt a two-stage learning framework: the first stage localizes the discriminative regions of objects, and the second encodes the discriminative features for training classifiers. However, these methods generally have two limitations: (1) Separation of the two-stage learning is time-consuming. (2) Dependence on object and parts annotations for discriminative localization learning leads to heavily labor-consuming labeling. It is highly challenging to address these two important limitations simultaneously. Existing methods only focus on one of them. Therefore, this paper proposes a discriminative localization approach via saliency-guided Faster R-CNN to address the above two limitations at the same time, and our main novelties and advantages are: (1) An end-to-end network based on Faster R-CNN is designed to simultaneously localize discriminative regions and encode discriminative features, which accelerates classification speed. (2) Saliency-guided localization learning is proposed to localize the discriminative region automatically, avoiding labor-consuming labeling. Both are jointly employed to simultaneously accelerate classification speed and eliminate dependence on object and parts annotations. Compared with the state-of-the-art methods on the widely-used CUB-200-2011 dataset, our approach achieves both the best classification accuracy and efficiency. Comment: 9 pages, to appear in ACM MM 201

    Improving neural networks by preventing co-adaptation of feature detectors

    When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition
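    A minimal sketch of the random omission described above, written with the Python standard library only. The function name `dropout` and its parameters are illustrative, not from the paper. The test-time branch, which scales activations by the keep probability so their expected value matches training, is a common convention assumed here; the abstract itself does not describe it.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Randomly omit feature detectors for one training case.

    During training each activation is zeroed independently with
    probability `drop_prob`. At test time (an assumption, see lead-in)
    all units are kept and scaled by the keep probability.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training:
        keep_prob = 1.0 - drop_prob
        return [a * keep_prob for a in activations]
    rng = rng or random  # the random module exposes random() as well
    return [0.0 if rng.random() < drop_prob else a for a in activations]


hidden = [0.8, 1.5, 0.3, 2.1]
# A fresh random mask is drawn on every call, i.e. for every training case.
print(dropout(hidden, rng=random.Random(0)))
# Deterministic at test time.
print(dropout(hidden, training=False))
```

    Because a new mask is drawn per training case, no unit can rely on a specific set of other units being present, which is the co-adaptation the abstract says is prevented.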

    Modeling the Resource Requirements of Convolutional Neural Networks on Mobile Devices

    Convolutional Neural Networks (CNNs) have revolutionized the research in computer vision, due to their ability to capture complex patterns, resulting in high inference accuracies. However, the increasingly complex nature of these neural networks means that they are particularly suited for server computers with powerful GPUs. We envision that deep learning applications will be eventually and widely deployed on mobile devices, e.g., smartphones, self-driving cars, and drones. Therefore, in this paper, we aim to understand the resource requirements (time, memory) of CNNs on mobile devices. First, by deploying several popular CNNs on mobile CPUs and GPUs, we measure and analyze the performance and resource usage for every layer of the CNNs. Our findings point out the potential ways of optimizing the performance on mobile devices. Second, we model the resource requirements of the different CNN computations. Finally, based on the measurement, profiling, and modeling, we build and evaluate our modeling tool, Augur, which takes a CNN configuration (descriptor) as the input and estimates the compute time and resource usage of the CNN, to give insights about whether and how efficiently a CNN can be run on a given mobile platform. In doing so Augur tackles several challenges: (i) how to overcome profiling and measurement overhead; (ii) how to capture the variance in different mobile platforms with different processors, memory, and cache sizes; and (iii) how to account for the variance in the number, type and size of layers of the different CNN configurations
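    To make the idea of estimating resource usage from a layer descriptor concrete, the sketch below counts multiply-accumulate operations and memory for a single convolutional layer. It is not Augur's model, which the abstract does not specify: the `ConvLayer` descriptor, the `conv_cost` function, and the 4-byte value size are all assumptions introduced for illustration, and the analytical count says nothing about platform-specific effects such as caches.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical descriptor for one square convolutional layer."""
    in_channels: int
    out_channels: int
    kernel: int
    stride: int
    padding: int
    in_size: int  # input height == width


def conv_cost(layer: ConvLayer, bytes_per_value: int = 4) -> dict:
    """Analytical operation and memory counts for one conv layer."""
    out_size = (layer.in_size + 2 * layer.padding - layer.kernel) // layer.stride + 1
    # One multiply-accumulate per kernel weight per output value.
    macs = (out_size * out_size * layer.out_channels
            * layer.in_channels * layer.kernel * layer.kernel)
    # Weights plus one bias per output channel.
    params = (layer.out_channels * layer.in_channels * layer.kernel * layer.kernel
              + layer.out_channels)
    outputs = out_size * out_size * layer.out_channels
    return {
        "out_size": out_size,
        "macs": macs,
        "weight_bytes": params * bytes_per_value,
        "output_bytes": outputs * bytes_per_value,
    }


# Example descriptor shaped like the first layer of an AlexNet-style network.
first = ConvLayer(in_channels=3, out_channels=96, kernel=11,
                  stride=4, padding=0, in_size=227)
print(conv_cost(first))
```

    Summing such per-layer counts over a whole configuration gives a rough lower bound on work and memory; turning that into a time estimate for a given device would still need measured, platform-specific factors.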